Least squares and stochastic difference equations
نویسندگان
چکیده
منابع مشابه
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-98-152.) TD( ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD( ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it ...
متن کاملIncremental Least-Squares Temporal Difference Learning
Approximate policy evaluation with linear function approximation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algorithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data efficient than conventional TD algorithms such as TD(0) and is more comp...
متن کاملKernel Least-Squares Temporal Difference Learning
Kernel methods have attracted many research interests recently since by utilizing Mercer kernels, non-linear and non-parametric versions of conventional supervised or unsupervised learning algorithms can be implemented and usually better generalization abilities can be obtained. However, kernel methods in reinforcement learning have not been popularly studied in the literature. In this paper, w...
متن کاملConstrained Least-Squares Density-Difference Estimation
We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure that first estimates two densities separately and then computes their difference. However, such a two-step procedure does not necessarily work well because the first step is performed without regard to the second step and thus a small error in the first stage can cause...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Journal of Econometrics
سال: 1976
ISSN: 0304-4076
DOI: 10.1016/0304-4076(76)90025-7